A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Authors
Abstract
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric involved concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted variance yields a reward function that depends on the mean, and this dependency renders traditional dynamic programming methods inapplicable, since it suppresses a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP into a standard one with a redefined reward function, and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel structure for the discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can be complemented with the proposed framework as well. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
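The bilevel structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a toy two-state, two-action MDP, a variance-penalized transformed reward of the hypothetical form r - β(r - λ)², and a per-step normalization (1 - γ)V for the pseudo-mean update; all model parameters and names are invented for illustration.

```python
# Illustrative bilevel mean-variance scheme on a toy 2-state, 2-action MDP.
# Outer loop: fix a pseudo-mean lam. Inner loop: standard value iteration
# on a transformed reward r - BETA * (r - lam)**2 (an assumed penalized form).
GAMMA, BETA = 0.9, 0.5
S, A = 2, 2
P = [[[0.8, 0.2], [0.3, 0.7]],   # P[s][a][s_next]: toy transition model
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 2.0],                 # R[s][a]: toy reward model
     [0.5, 3.0]]

def value_iteration(reward, tol=1e-8):
    """Standard discounted value iteration; returns a greedy policy."""
    V = [0.0] * S
    while True:
        Q = [[reward[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        V_new = [max(Q[s]) for s in range(S)]
        if max(abs(V_new[s] - V[s]) for s in range(S)) < tol:
            return [max(range(A), key=lambda a: Q[s][a]) for s in range(S)]
        V = V_new

def mean_value(policy, tol=1e-8):
    """Policy evaluation under the ORIGINAL reward R (mean criterion)."""
    V = [0.0] * S
    while True:
        V_new = [R[s][policy[s]] +
                 GAMMA * sum(P[s][policy[s]][t] * V[t] for t in range(S))
                 for s in range(S)]
        if max(abs(V_new[s] - V[s]) for s in range(S)) < tol:
            return V_new
        V = V_new

# Bilevel loop: inner VI solves the transformed standard MDP, outer step
# resets the pseudo-mean to the new policy's (per-step-scale) mean reward.
lam = 0.0
for _ in range(100):
    r_tilde = [[R[s][a] - BETA * (R[s][a] - lam) ** 2 for a in range(A)]
               for s in range(S)]
    policy = value_iteration(r_tilde)
    new_lam = (1 - GAMMA) * mean_value(policy)[0]  # mean from state 0
    if abs(new_lam - lam) < 1e-10:
        break
    lam = new_lam
```

The outer iteration terminates either at a fixed point of the pseudo-mean update or at the iteration cap; the inner problems are ordinary discounted MDPs, which is the point of the transform.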
Similar resources
Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for oth...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Algorithmic aspects of mean-variance optimization in Markov decision processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for oth...
Simplex Algorithm for Countable-State Discounted Markov Decision Processes
We consider discounted Markov Decision Processes (MDPs) with countably-infinite state spaces, finite action spaces, and unbounded rewards. Typical examples of such MDPs are inventory management and queueing control problems in which there is no specific limit on the size of inventory or queue. Existing solution methods obtain a sequence of policies that converges to optimality i...
Risk-Sensitive and Mean Variance Optimality in Markov Decision Processes
In this note, we compare two approaches for handling risk-variability features arising in discrete-time Markov decision processes: models with exponential utility functions and mean variance optimality models. Computational approaches for finding optimal decision with respect to the optimality criteria mentioned above are presented and analytical results showing connections between the above op...
Journal
Journal title: European Journal of Operational Research
Year: 2023
ISSN: 1872-6860, 0377-2217
DOI: https://doi.org/10.1016/j.ejor.2023.06.022